FAQs

Q: What does GMDH stand for?

A: GMDH stands for Group Method of Data Handling. It is a statistical learning network technology that applies the cybernetic approach of self-organization, drawing on systems, information and control theory and on computer science. GMDH is not a traditional statistical modeling method. It is an interdisciplinary approach intended to overcome some of the main disadvantages of statistics and neural networks (NNs). Below is a description of GMDH from the preface to Farlow's book.

"In statistics nowadays there is a distinguishable trend away from the restrictive assumptions of parametric analysis and toward the more computer-oriented area of what is generally known as nonparametric data analysis. One of the more fascinating concepts from this new generation of research is what is known as the GMDH algorithm, which was introduced and is currently being developed by the Ukrainian cyberneticist and engineer A.G. Ivakhnenko. What is known these days as a heuristic, the GMDH algorithm constructs high-order regression-type models for complex systems and has the advantage over traditional modeling in that the modeler can more-or-less throw into the algorithm all sorts of input/output types of observations, and the computer does the rest. The computer self-organizes the model from a simple one to one of optimal complexity by a methodology not unlike the process of natural evolution. It is the purpose of this book to introduce to English-speaking people the basic GMDH algorithm, present variations and examples of its use and list a bibliography of all published work in this growing area of research."

S. J. Farlow, Self-Organizing Methods in Modeling: GMDH Type Algorithms (1984)

You can find a short intro in Paper 1 (section Self-organizing modeling technologies) on our
web site. You may also want to look at the publications
area for more information.

A: Yes, this is one of the primary application
fields for KnowledgeMiner. In contrast to statistics or NNs,
you can use more variables than there are samples available
for modeling. For example, you can create a prediction model
(e.g., a linear system of equations) of 40 variables even if
only 30 observations are available for each variable. You can
consider up to 500 input variables (lagged and unlagged) in
KnowledgeMiner to model complex time processes.
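To make "lagged and unlagged" input variables concrete, here is a minimal sketch of how a table of time series can be expanded into candidate inputs for dynamic modeling. The function name `make_lagged_inputs` and the naming scheme for the columns are assumptions made for this illustration; this is not KnowledgeMiner's internal representation.

```python
def make_lagged_inputs(series, max_lag):
    """Expand time series into unlagged and lagged candidate inputs.

    series  -- dict mapping a variable name to a list of values
               (all lists must have the same length)
    max_lag -- largest lag to generate for every variable
    Returns (names, rows): one row per time step t >= max_lag.
    """
    names = [
        f"{name}(t-{lag})" if lag else f"{name}(t)"
        for name in series
        for lag in range(max_lag + 1)
    ]
    length = len(next(iter(series.values())))
    rows = []
    # The first max_lag time steps are skipped because their
    # lagged values would fall before the start of the data.
    for t in range(max_lag, length):
        rows.append([
            series[name][t - lag]
            for name in series
            for lag in range(max_lag + 1)
        ])
    return names, rows


if __name__ == "__main__":
    names, rows = make_lagged_inputs(
        {"x": [1, 2, 3, 4, 5], "z": [10, 20, 30, 40, 50]}, max_lag=2
    )
    print(names)
    for row in rows:
        print(row)
```

Each variable contributes `max_lag + 1` columns, so the number of candidate inputs grows quickly: 10 series with lags 0 to 49 already give 10 x 50 = 500 columns, while every additional lag costs one usable sample.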
Additionally, KnowledgeMiner implements Analog
Complexing, a very powerful prediction technique for
fuzzy processes such as financial markets. KnowledgeMiner,
when used on financial markets, could really strike gold!

A: Yes, exactly. This is something KnowledgeMiner can do.

A: Yes, you are correct. One important feature of
KnowledgeMiner is that it creates models in an evolutionary
way, from very simple models to increasingly complex
ones. It stops automatically when an optimally complex model
is found, that is, at the point where a more complex model
would begin to overfit the design data (the data used to
create relationships between variables).

A: The same is true if you want to create a
dynamic model. In contrast to statistics or neural networks,
KnowledgeMiner can deal with a very small number of cases
(6 or more). In fact, the number of cases used for modeling
can be smaller than the number of variables (so-called
under-determined tasks). So it really is possible for you
to create a linear system of equations from 10 variables and
only 6-10 samples.

A: The table contains approximate values for
orientation:
KnowledgeMiner optimizes several modeling tasks, so it is
not possible to give exact values in advance. The real
memory requirements may actually be smaller.

A: Two aspects: speed and RAM space. 180 MHz is
good even for large problems. For small modeling problems
(< 50 inputs and < 100 samples) it will take a few
minutes and roughly 100 KB-2 MB of RAM temporarily to create
a GMDH model (once you are familiar with it). However, RAM
requirements in particular grow rapidly (10-100 MB and more)
with larger modeling problems (> 100 inputs and > 500
samples). It can then take up to an hour or two to get a
model. Alternative methods would take days or weeks for
problems of this complexity.

A: No, KM is "Un-PC" too! It can handle up to 500
inputs (including lagged variables for dynamic modeling) and
a virtually unlimited number of outputs (read: models) in a
single document using the same physical data sheet without
copying/pasting any data. All models are stored in a model
base and for each column of the sheet, 4 different model
types can be created and stored simultaneously: a time
series model (auto-regressive), an input-output model
(static or dynamic), a system model
(multi-input/multi-output) and an Analog Complexing
model.

A: This has been described a little elsewhere in this FAQ. An important advantage is also that KM always produces a model description usable for interpretation and analysis. You can see why the results are as they are and which variables KM has selected as relevant. For fuzzy-rule induction, for example, you will get models in almost natural language, as this model from the wine recognition example shows:

IF N_Flavanoids & NOT_N_Nonflavanoid phenols & NOT_N_Color intensity

The main difference, however, is that KM, in addition to
the black-box approach and the connectionism of NNs, is
based on a third principle called inductive
self-organization.
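The idea of inductive self-organization described in this FAQ (start with a very simple model, grow it step by step, and stop when a more complex model would begin to overfit) can be illustrated with a small sketch. It is a deliberately simplified forward-selection scheme with an external checking criterion. It is neither KnowledgeMiner's actual algorithm nor a full multilayer GMDH. All function names and the synthetic data are assumptions made for this illustration. It uses the setting from the example above: 40 variables with only 30 observations, split here into 20 design cases and 10 checking cases.

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows, y, cols):
    """Least-squares coefficients (intercept first) for the chosen input columns."""
    x = [[1.0] + [r[c] for c in cols] for r in rows]
    k = len(cols) + 1
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


def mse(rows, y, cols, coef):
    """Mean squared prediction error of a linear model on the given cases."""
    pred = [coef[0] + sum(w * r[c] for w, c in zip(coef[1:], cols)) for r in rows]
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def self_organize(design, y_design, check, y_check):
    """Grow a linear model one input at a time.

    Coefficients are estimated on the design data only. Candidates are
    compared on the separate checking data (the external criterion).
    Growth stops as soon as no extra input lowers the checking error.
    """
    n_inputs = len(design[0])
    cols = []
    best_err = mse(check, y_check, cols, fit(design, y_design, cols))
    # Keep fewer coefficients than design cases so each fit stays determined.
    while len(cols) < min(n_inputs, len(design) - 2):
        trials = []
        for c in range(n_inputs):
            if c not in cols:
                cand = cols + [c]
                coef = fit(design, y_design, cand)
                trials.append((mse(check, y_check, cand, coef), cand))
        err, cand = min(trials)
        if err >= best_err:  # a more complex model no longer helps: stop
            break
        cols, best_err = cand, err
    return cols, best_err


def demo(seed=0):
    """Synthetic data: 40 inputs, 30 cases, only inputs 17 and 3 matter."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0, 1) for _ in range(40)] for _ in range(30)]
    y = [5 * r[17] + r[3] + rng.gauss(0, 0.05) for r in rows]
    return self_organize(rows[:20], y[:20], rows[20:], y[20:])


if __name__ == "__main__":
    cols, err = demo()
    print("selected inputs:", cols, "checking error:", round(err, 4))
```

The point of the sketch is the division of labor: the design data only estimate coefficients, while the checking data decide which inputs enter the model and when growth stops. This is why the number of candidate variables can exceed the number of cases, since no single candidate model ever uses all variables at once.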
Contact:
knowledgeminer@iworld.to
julian@scriptsoftware.com
Date Last Modified: 03/23/99